Adaptive multimodal continuous ant colony optimization
Seeking multiple optima simultaneously, which is the aim of multimodal optimization, has attracted increasing attention but remains challenging. Taking advantage of the ability of ant colony optimization algorithms to preserve high diversity, this paper extends ant colony optimization to multimodal optimization. First, combined with current niching methods, an adaptive multimodal continuous ant colony optimization algorithm is introduced. In this algorithm, an adaptive parameter adjustment is developed that takes the differences among niches into consideration. Second, to accelerate convergence, a differential evolution mutation operator is alternatively utilized to build base vectors for ants to construct new solutions. Third, to enhance exploitation, a local search scheme based on the Gaussian distribution is self-adaptively performed around the seeds of niches. Together, these components afford a good balance between exploration and exploitation. Extensive experiments on 20 widely used benchmark multimodal functions are conducted to investigate the influence of each algorithmic component, and the results are compared with several state-of-the-art multimodal algorithms and winners of competitions on multimodal optimization. These comparisons demonstrate the competitive efficiency and effectiveness of the proposed algorithm, especially on complex problems with large numbers of local optima.
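The abstract names two concrete mechanisms: a differential-evolution mutation that builds base vectors from niche members, and a Gaussian local search around each niche's seed. Below is a minimal Python sketch of those two ingredients, assuming a niche is simply an array of candidate solutions and the objective is minimized; the function names, the scale factor F, and the step size sigma are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def de_base_vector(niche, F=0.5, rng=None):
    """DE/rand/1-style mutation: build a base vector from three distinct
    members of one niche (niche: array of shape (n_members, dim))."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2, r3 = rng.choice(len(niche), size=3, replace=False)
    return niche[r1] + F * (niche[r2] - niche[r3])

def gaussian_local_search(seed, sigma, objective, rng=None):
    """Sample one candidate from a Gaussian centered at the niche seed and
    keep it only if it improves the objective (minimization assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    candidate = seed + rng.normal(0.0, sigma, size=seed.shape)
    return candidate if objective(candidate) < objective(seed) else seed
```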
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Large pre-trained transformers are the show-stealers of modern-day deep learning, and it becomes crucial to comprehend the parsimonious patterns that exist within them as they grow in scale. With exploding parameter counts, the Lottery Ticket Hypothesis (LTH) and its variants have lost their practicality for sparsifying such models due to the high computation and memory cost of the repetitive train-prune-retrain routine of iterative magnitude pruning (IMP), which worsens with increasing model size. In this paper, we comprehensively study induced sparse patterns across multiple large pre-trained vision and language transformers. We propose the existence of "essential sparsity", defined by a sharp dropping point in the sparsity-performance curve beyond which performance declines much faster as the sparsity level rises, when we directly remove the weights with the smallest magnitudes in one shot. We also present an intriguing emergent phenomenon of abrupt sparsification during the pre-training of BERT, i.e., BERT suddenly becomes heavily sparse after a certain number of pre-training iterations. Moreover, our observations indicate the counter-intuitive finding that BERT trained on a larger amount of pre-training data tends to condense knowledge into comparatively fewer parameters. Lastly, we investigate the effect of the pre-training loss on essential sparsity and discover that self-supervised learning (SSL) objectives trigger stronger emergent sparsification than supervised learning (SL). Our code is available at https://github.com/VITA-Group/essential_sparsity
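The core measurement the abstract describes is one-shot magnitude pruning swept over sparsity levels to trace a sparsity-performance curve, on which the essential-sparsity point is the sharp drop. The sketch below illustrates that protocol in PyTorch under simplifying assumptions (a single global magnitude threshold, only weight matrices pruned, no retraining); `build_model` and `evaluate` are hypothetical helpers, and the paper's exact procedure may differ.

```python
import torch

def one_shot_magnitude_prune(model, sparsity):
    """Zero out the globally smallest-magnitude weights so that roughly
    `sparsity` fraction of matrix weights are removed, in one shot."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    k = max(1, int(sparsity * weights.numel()))
    threshold = weights.kthvalue(k).values  # k-th smallest magnitude
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).to(p.dtype))

def sparsity_performance_curve(build_model, evaluate, levels):
    """Prune a fresh copy of the pre-trained model at each sparsity level
    and record the task metric; the essential-sparsity point is where the
    resulting curve starts to fall sharply."""
    curve = []
    for s in levels:
        model = build_model()               # hypothetical: returns a pre-trained model
        one_shot_magnitude_prune(model, s)
        curve.append((s, evaluate(model)))  # hypothetical: returns a task metric
    return curve
```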
- …